Table 7: The Hebrew-Latin Transliteration

Authors

  • Robert L. Mercer
  • Sadaoki Furui
Abstract

The following rules are used to automatically generate the SW set for every morphological analysis in Hebrew. Note that if an analysis includes a particular attached particle, this particle is also attached to each of its similar words:

1. A definite form of a noun: the SW set includes the indefinite form of the same noun.
2. An indefinite form of a noun: the definite form of the same noun.
3. A noun with a possessive pronoun: the same noun with all the other possessive pronouns with the same person attribute.
4. An adjective: the other forms of the same adjective (changing the gender and number attributes).
5. A verb without an object pronoun: the same verb in the same tense and person (changing the gender and number attributes only).
6. A verb with an object pronoun: the same verb form with all the other object pronoun forms (preserving the person attribute while changing the gender and number attributes).
7. A nominal personal pronoun: the other nominal personal pronouns of the same person.
8. A masculine form of a number: the feminine form of the same number.
9. A feminine form of a number: the masculine form of the same number.
10. A proper noun or a particle (preposition, connective, etc.): the empty SW set.

Appendix A: Given below is the Latin-Hebrew transliteration used throughout the paper. Note that the accepted transcriptions for Hebrew (Academy of the Hebrew Language, 1957; Ornan, 1994) include indications of the vowels, which are missing in the modern Hebrew writing system. For this reason, these transcriptions are not suitable for demonstrating the morphological ambiguity problem in the language. Instead, we use the following transliteration, which is based on the phonemic script (Ornan, 1994):

Dagan, Ido and Alon Itai. 1994. Word sense disambiguation using a second language monolingual corpus. Word sense disambiguation using statistical models of Roget's categories trained on large corpora. In Proc. of COLING, pages 454-460.

The results of the experiment confirm the conjecture we made about the nature of the morphological ambiguity problem in Hebrew. It can be argued, therefore, that the computer, with its complete morphological knowledge, faces a much more complex problem than a human reader of a Hebrew text, who may be ignorant of some rare analyses. This observation is also …
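
The SW-set rules listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Analysis` feature bundle, its field names, and the `similar_words` function are all hypothetical, and only rules 1, 2, 4, 8, 9 and 10 are modeled. Possessive pronouns, verbs and personal pronouns (rules 3, 5, 6 and 7) are left out and simply fall through to the empty set. The attached particle is kept in a `prefix` field so that it carries over to every similar word, as the note before the rules requires.

```python
from dataclasses import dataclass, replace
from itertools import product
from typing import Optional, Set

GENDERS = ("masc", "fem")
NUMBERS = ("sing", "plur")


@dataclass(frozen=True)
class Analysis:
    """Hypothetical feature bundle for one morphological analysis."""
    lemma: str
    pos: str                          # e.g. "noun", "adjective", "number", "proper_noun"
    definite: Optional[bool] = None   # only meaningful for nouns in this sketch
    gender: Optional[str] = None
    number: Optional[str] = None
    prefix: str = ""                  # attached particle, copied to every similar word


def similar_words(a: Analysis) -> Set[Analysis]:
    """Return the SW set of `a` under rules 1, 2, 4, 8, 9 and 10 only."""
    if a.pos == "noun" and a.definite is not None:
        # Rules 1 and 2: flip definiteness, keep all other features.
        return {replace(a, definite=not a.definite)}
    if a.pos == "adjective":
        # Rule 4: every other gender/number combination of the same adjective.
        variants = {replace(a, gender=g, number=n)
                    for g, n in product(GENDERS, NUMBERS)}
        return variants - {a}
    if a.pos == "number" and a.gender in GENDERS:
        # Rules 8 and 9: swap masculine and feminine.
        other = "fem" if a.gender == "masc" else "masc"
        return {replace(a, gender=other)}
    # Rule 10 (proper nouns, particles); rules 3, 5, 6, 7 are not modeled here.
    return set()


if __name__ == "__main__":
    # "LEMMA" and the prefix "b" are placeholders, not data from the paper.
    noun = Analysis(lemma="LEMMA", pos="noun", definite=True, prefix="b")
    for sw in similar_words(noun):
        print(sw)
```

Because `dataclasses.replace` copies every field that is not overridden, the attached particle in `prefix` is preserved on each generated similar word without any extra bookkeeping.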

Similar resources

Sideways Transliteration: How to Transliterate Multicultural Person Names?

In a global setting, texts contain transliterated names from many cultural origins. Correct transliteration depends not only on the target and source languages but also on the source language of the name. We introduce a novel methodology for transliteration of names originating in different languages using only monolingual resources. Our method is based on a step of noisy transliteration and then ...

7-bit Meta-Transliterations for 8-bit Romanizations

[7-bit encoding, transliteration] We propose a general strategy for deriving 7-bit encodings for texts in languages which use an alphabetic non-Roman script, like Arabic, Persian, Sanskrit and many other Indic scripts, and for which there is some transliteration convention using Roman letters with additional diacritical marks. These schemes, which we will call "meta-transliterations", are based...

Lightly Supervised Transliteration for Machine Translation

We present a Hebrew to English transliteration method in the context of a machine translation system. Our method uses machine learning to determine which terms are to be transliterated rather than translated. The training corpus for this purpose includes only positive examples, acquired semi-automatically. Our classifier reduces more than 38% of the errors made by a baseline method. The identif...

Unsupervised Constraint Driven Learning For Transliteration Discovery

This paper introduces a novel unsupervised constraint-driven learning algorithm for identifying named-entity (NE) transliterations in bilingual corpora. The proposed method does not require any annotated data or aligned corpora. Instead, it is bootstrapped using a simple resource – a romanization table. We show that this resource, when used in conjunction with constraints, can efficiently ident...

Understanding the Beginning of Genesis: Just How Many

The first word of Genesis, as traditionally pronounced, means literally “In a beginning”. However, tradition has it meaning and translated as “In the beginning”. The literal meaning is considered as contradicting reality. Therefore, Rashi attempted a syntactic solution to resurrect the traditional meaning. However, this syntactic solution requires a change in the pronunciation ...

Publication date: 1995